Complete genomes of three closely related Gram-positive bacteria, Streptococcus pyogenes, Streptococcus pneumoniae and Lactococcus lactis, are analyzed for abundances of short DNA sequence motifs (frequent words). The character and extent of frequent words are strikingly different among these genomes. The frequent words of S.pneumoniae split into three categories: parts of the previously characterized RUP repetitive element, parts of the previously characterized BOX repetitive element, and a 24 bp tandem repeat in the gene SP1772. The most abundant frequent words of L.lactis are all related to the 13 bp motif WWNTTACTGACRR or its inverted complement YYGTCAGTAANWW. Distributional analysis of this motif, which we call the highly repetitive motif (HRM), suggests a possible dual role. Frequent occurrences immediately downstream of genes suggest a possible role in transcription termination, whereas spacings of consecutive HRMs consistent with the DNA helical period are indicative of a protein-binding site. Two regions of the L.lactis genome feature an intriguing pattern of several periodically occurring HRMs separated by precisely 59 bp. In striking contrast to S.pneumoniae and L.lactis, S.pyogenes contains hardly any frequent words.
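As an illustration of how a degenerate motif such as the HRM might be located on both strands, the following is a minimal sketch and not the method used in the study. It assumes the standard IUPAC nucleotide codes (W = A/T, R = A/G, Y = C/T, N = any base) and measures spacing from the start of one occurrence to the start of the next, which is one possible reading of "separated by". The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def reverse_complement(motif):
    """Return the inverted complement of an IUPAC motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a motif into a regex; the lookahead allows overlapping hits."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")


def find_sites(sequence, motif):
    """Return sorted (start, strand) pairs for the motif and its inverted complement."""
    sites = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_regex(pattern).finditer(sequence):
            sites.append((match.start(), strand))
    return sorted(sites)


def spacings(sites):
    """Start-to-start distances between consecutive sites (assumed convention)."""
    positions = [pos for pos, _ in sites]
    return [b - a for a, b in zip(positions, positions[1:])]


HRM = "WWNTTACTGACRR"  # motif as given in the abstract
```

Calling `reverse_complement(HRM)` reproduces the inverted complement quoted in the abstract, and `spacings(find_sites(genome, HRM))` on a genome string would give the distribution of distances between consecutive occurrences, which could then be inspected for periodicity.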